RNNs provide a way to capture the structure of sequential data. In particular, RNNs built from LSTM cells can take both the long- and short-term behavior of a sequence into account when predicting its next value. Unlike a conventional feed-forward neural network, an RNN can handle input data that does not have a fixed shape or length. The example in this notebook generates music similar to a given set of input songs. It builds on the activity outlined in the first lab session of MIT's Introduction to Deep Learning course (6.S191).
LSTM cells are gated cells that maintain both a cell state and a hidden state. The cell state acts as a selective memory that can retain information first recorded many time steps earlier; it persists from one step to the next until the network's forget gate decides there is reason to discard it. The hidden state, by contrast, summarizes the inputs seen so far and is more sensitive to recent inputs. The roles of the two states are closely tied to how they are updated. The hidden state is computed through matrix multiplications, so applying gradient descent by backpropagating through time multiplies many such matrices together, and gradient magnitudes shrink the further back in time the backpropagation goes: the 'vanishing gradient' problem. The cell state, by contrast, is updated additively and element-wise, with no matrix multiplication in its path, so gradients can flow back through time without this loss of resolution.
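The gate arithmetic described above can be sketched in a few lines of NumPy. This is an illustrative single LSTM step, not TensorFlow's implementation; the stacked weight layout `W` and the gate ordering are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W stacks the four gate weight matrices,
    shape (4*n, n + d); b has shape (4*n,). Layout is an assumption."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b  # one matrix multiply for all gates
    f = sigmoid(z[:n])           # forget gate: what to keep from the old cell state
    i = sigmoid(z[n:2 * n])      # input gate: how much of the candidate to write
    g = np.tanh(z[2 * n:3 * n])  # candidate values
    o = sigmoid(z[3 * n:])       # output gate
    c = f * c_prev + i * g       # cell state: additive, element-wise update
    h = o * np.tanh(c)           # hidden state: fed by the matrix-multiplied path
    return h, c
```

Because `c` is updated element-wise rather than through a repeated matrix multiplication, its gradient path back through time does not shrink the way the hidden-state path does.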
! pip install regex
!apt-get install abcmidi timidity > /dev/null 2>&1
import tensorflow as tf
tf.enable_eager_execution()
import numpy as np
import functools
import regex as re
import os
import urllib.request
import time
urllib.request.urlretrieve('https://raw.githubusercontent.com/ksureshprojects/introtodeeplearning_labs_python3/master/__init__.py', 'util.py')
import util
print('tf version: {}'.format(tf.__version__))
is_correct_tf_version = '1.14.0' in tf.__version__
assert is_correct_tf_version, "Wrong tensorflow version ({}) installed".format(tf.__version__)
is_eager_enabled = tf.executing_eagerly()
assert is_eager_enabled, "Tensorflow eager mode is not enabled"
# Check gpu is available in runtime
assert tf.test.is_gpu_available()
The data used to train our music-generating model is an 'abc' file containing several Irish tunes. The 'abc' file format is a specification for representing musical tunes in plain text.
path_to_file = tf.keras.utils.get_file('irish.abc', 'https://raw.githubusercontent.com/aamini/introtodeeplearning_labs/2019/lab1/data/irish.abc')
text = open(path_to_file).read()
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))
Each tune specified in a file has a set of headers that describe certain features of the tune such as composer name, time signature, and key. These headers are then followed by the musical notes and rests that actually make up the tune.
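As a concrete illustration, here is a small hand-written tune in abc notation (invented for this example, not taken from the dataset): `X:` is a reference number, `T:` the title, `M:` the meter, `L:` the default note length, and `K:` the key, after which the notes begin.

```python
# A hand-written abc tune (illustrative only, not from the Irish dataset).
example_tune = """X:1
T:Example Reel
M:4/4
L:1/8
K:D
DEFG ABcd|efge dBAF|DEFG ABcd|efge d2d2|"""

# Header lines take the form '<letter>:<value>'; the music follows the K: line.
headers = [line for line in example_tune.splitlines()
           if len(line) > 1 and line[1] == ':']
print(headers)
```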
print(text[:504])
The following functions extract candidate tunes from abc text, save each one to a file, convert it to a wav file, and play it.
# Write a small shell script that converts an abc file to wav:
# abc2midi produces a MIDI file, then timidity renders it to wav.
# Open in 'w' mode so re-running this cell overwrites the script rather than appending to it.
file = open('abc2wav', 'w')
cmd = 'abcfile=$1\nsuffix=${abcfile%.abc}\nabc2midi $abcfile -o "$suffix.mid"\ntimidity "$suffix.mid" -Ow "$suffix.wav"\nrm "$suffix.abc" "$suffix.mid"'
file.write(cmd)
file.close()
! chmod +x abc2wav
def extract_song_snippet(generated_text):
    # Tunes are separated by blank lines; overlapped matching ensures a blank
    # line shared by two adjacent tunes is counted as a separator for both.
    pattern = '\n\n(.*?)\n\n'
    search_results = re.findall(pattern, generated_text, overlapped=True, flags=re.DOTALL)
    songs = list(search_results)
    print("Found {} possible songs in generated texts".format(len(songs)))
    return songs

def save_song_to_abc(song, filename="tmp"):
    save_name = "{}.abc".format(filename)
    with open(save_name, "w") as f:
        f.write(song)
    return filename

def abc2wav(abc_file):
    path_to_tool = './abc2wav'
    cmd = "{} {}".format(path_to_tool, abc_file)
    return os.system(cmd)

def play_wav(wav_file):
    from IPython.display import Audio
    return Audio(wav_file)

def play_generated_song(generated_text):
    songs = extract_song_snippet(generated_text)
    if len(songs) == 0:
        print("No valid songs found in generated text. Try training the model longer or increasing the amount of generated music to ensure complete songs are generated!")
        return
    for song in songs:
        basename = save_song_to_abc(song)
        ret = abc2wav(basename + '.abc')
        if ret == 0:  # conversion succeeded
            return play_wav(basename + '.wav')
    print("None of the songs were valid, try training longer to improve syntax.")
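A note on the `overlapped=True` flag used in `extract_song_snippet`: consecutive tunes share the blank line between them, so non-overlapped matching consumes that separator with the first match and skips the next tune. A small sketch with hypothetical snippet text, using the standard library's `re` (which has no overlapped mode) for comparison:

```python
import re  # the standard library module, for comparison

sample = "\n\nsong one\n\nsong two\n\n"

# Non-overlapped matching consumes the shared blank line,
# so only the first snippet is found.
found = re.findall(r'\n\n(.*?)\n\n', sample, flags=re.DOTALL)
print(found)  # only ['song one'] — the second snippet is missed
```

The `regex` package's `overlapped=True` lets a new match begin inside an earlier one, so both snippets are returned.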
play_generated_song(text)